# Read PDFs Aloud with AI Voice (Text-to-Speech)

Annolid includes an integrated PDF viewer with text-to-speech (TTS), so you can open a PDF and have selected text (or whole paragraphs) read aloud.

## Prerequisites

1) Install Annolid (see `install.md`).
2) Install PDF support:

   ```bash
   pip install pymupdf
   ```

3) Install a TTS backend (pick one):

   - Recommended (offline, higher-quality “AI voice”): Kokoro (ONNX)

     ```bash
     pip install kokoro-onnx onnxruntime gdown
     ```

   - Pocket TTS (very lightweight, CPU-only runtime, voices such as `alba`, `marius`, `javert`, `jean`, `fantine`, `cosette`, `eponine`, and `azelma`)

     ```bash
     pip install pocket-tts
     ```

     Set `Engine = Pocket`, choose one of the built-in voices, or type a custom voice ID / prompt path. If you have a short WAV prompt of the desired voice, specify it in the “Pocket prompt” field to clone that tone. Use the `Pocket speed` control to speed up or slow down the generated speech (0.5–2.0×). (Optional) You can also install via `pip install annolid[pocket_tts]` so the dependency is available automatically.

   - Voice cloning (offline, uses a short voice prompt): Chatterbox Turbo (ONNX)

     ```bash
     pip install onnxruntime soundfile
     ```

     Then select `Engine = Chatterbox` and choose a voice prompt audio file in the `PDF Speech` dock (or edit `~/.annolid/tts_settings.json`).

   - Language packs for Kokoro when you want Chinese or Japanese voices:

     ```bash
     pip install misaki[zh]  # enables Mandarin (e.g., voice zf_001)
     pip install misaki[ja]  # enables Japanese (e.g., voice jf_alpha)
     ```

   - Fallback (online, simpler): Google TTS

     ```bash
     pip install gTTS pydub
     ```

     `pydub` needs `ffmpeg` available on your system.

## Open a PDF in Annolid

1) Launch the GUI:

   ```bash
   annolid
   ```

2) Go to `File` → `Open PDF...` and pick a `.pdf`.

Annolid switches into PDF view and shows these docks (typically on the right):

- `PDF Speech` (voice / language / speed)
- `PDF Controls` (page + zoom)
- `PDF Reader` (click-to-read mode)

## Option A: Speak a selection (fastest)

This works in both the fallback viewer (image + text panel) and the PDF.js viewer.

1) Select some text (either in the page text panel, or directly on the PDF page).
2) Right-click → `Speak selection`.

## Option B: Click-to-read paragraphs (PDF.js reader mode)

This reads full paragraphs/sentences starting from where you click.

1) In the `PDF Reader` dock, enable `Use PDF.js (required for reader)`.
2) Keep `Enable click-to-read` turned on.
3) Click a paragraph in the PDF page to start reading.
4) Use `Pause/Resume`, `Stop`, `Prev`, `Next` in the same dock.

If the reader says it’s unavailable, install QtWebEngine (`pyqtwebengine` in conda, or `PyQtWebEngine` via pip) and restart Annolid.

## Change voice, language, and speed

Use the `PDF Speech` dock to set:

- `Voice` (example: `af_sarah`)
- `Voice` (Chinese): `zf_001` (requires `misaki[zh]`)
- `Voice` (Japanese): `jf_alpha` (requires `misaki[ja]`)
- `Language` (example: `en-us`)
- `Speed` (0.5–2.0)

These settings persist in `~/.annolid/tts_settings.json`.

## Troubleshooting

- **“PyMuPDF Required” dialog**: run `pip install pymupdf`.
- **No audio output**:
  - Make sure `ANNOLID_DISABLE_AUDIO` is not set.
  - On Linux servers/containers, ensure an audio device is present (or use a desktop machine).
- **First Kokoro run is slow**: Annolid downloads model files into `~/.annolid/kokoro` the first time.
- **gTTS fails**: it requires internet access; also ensure `ffmpeg` is installed for `pydub`.
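
Several of the troubleshooting checks above can be run in one pass before launching the GUI. The script below is a minimal sketch and is not part of Annolid: the function names (`module_available`, `collect_report`) are hypothetical, and the import names in `OPTIONAL_MODULES` are assumptions inferred from the pip package names (for example, `pocket_tts` for `pocket-tts`). It only reports what it finds and does not install or change anything.

```python
"""Hypothetical diagnostic helper for the PDF read-aloud setup (not shipped with Annolid)."""
import importlib.util
import os
import shutil
from pathlib import Path

# Label -> candidate import names. These names are assumptions based on the
# pip package names; adjust them if a package exposes a different module.
OPTIONAL_MODULES = {
    "PDF support (pymupdf)": ("pymupdf", "fitz"),
    "Kokoro (kokoro-onnx)": ("kokoro_onnx",),
    "ONNX runtime (onnxruntime)": ("onnxruntime",),
    "Pocket TTS (pocket-tts)": ("pocket_tts",),
    "Chatterbox audio I/O (soundfile)": ("soundfile",),
    "Google TTS (gTTS)": ("gtts",),
    "Audio conversion (pydub)": ("pydub",),
}


def module_available(name: str) -> bool:
    """Return True if the module can be located, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def collect_report(env=None) -> dict:
    """Map each check label to True (looks fine) or False (needs attention)."""
    env = os.environ if env is None else env
    report = {
        label: any(module_available(name) for name in names)
        for label, names in OPTIONAL_MODULES.items()
    }
    # pydub relies on an ffmpeg executable being reachable on PATH.
    report["ffmpeg on PATH"] = shutil.which("ffmpeg") is not None
    # Audio output is disabled whenever this variable is set.
    report["ANNOLID_DISABLE_AUDIO unset"] = "ANNOLID_DISABLE_AUDIO" not in env
    # The Kokoro model directory only exists after the first download.
    kokoro_dir = Path.home() / ".annolid" / "kokoro"
    report["Kokoro models present (~/.annolid/kokoro)"] = kokoro_dir.is_dir()
    return report


if __name__ == "__main__":
    for label, ok in collect_report().items():
        print(f"[{'ok' if ok else '--'}] {label}")
```

Save it under any name (for example `check_tts_setup.py`, a hypothetical filename) and run it with the same Python environment that Annolid uses. A `--` entry is only a problem for the backend you actually selected: a missing `gtts` does not matter if you use Kokoro.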